A Comparative Study on Performance Benefits of Multi-core CPUs using OpenMP

Authors

Vijayalakshmi Saravanan

Mohan Radhakrishnan

Abstract

Achieving scalable parallelism in general-purpose programs has so far proved elusive, and extracting parallelism from programs has become a key focus of interest for multi-core CPUs. Many techniques and programming models, such as MPI, CUDA and OpenMP, have been adopted to obtain more performance, but there is still a need to identify which parallel programming techniques deliver the best performance. This article shows the performance benefit of a parallel programming model over a sequential one. To support this claim, we analyze the execution time of sequential and parallel implementations of naive matrix multiplication and of Strassen's matrix multiplication algorithm using OpenMP. Our results show that code optimized with OpenMP runs faster than the sequential implementation and that the parallel algorithms perform well.
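The comparison described in the abstract can be illustrated with a minimal sketch of the naive algorithm in C. This is not the authors' code: the function names `matmul_seq` and `matmul_omp`, the row-major layout, and the choice to parallelize only the outer loop are assumptions made for illustration. Strassen's algorithm is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential baseline: c = a * b for n x n matrices stored row-major.
 * Hypothetical helper, not taken from the paper. */
void matmul_seq(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Parallel variant: the rows of c are divided among OpenMP threads.
 * Each thread writes a disjoint set of rows, so no locking is needed.
 * If the compiler is not run with OpenMP enabled, the pragma is ignored
 * and the loop simply runs sequentially. */
void matmul_omp(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    long rows = (long)n; /* signed loop index for older OpenMP compilers */
    #pragma omp parallel for
    for (long i = 0; i < rows; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0; /* private: declared inside the parallel loop */
            for (size_t k = 0; k < n; ++k)
                sum += a[(size_t)i * n + k] * b[k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag. Timing both functions on the same inputs gives the kind of execution-time comparison the abstract describes. Any measured speedup depends on the matrix size, core count and memory bandwidth.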

Similar resources

A Comparative Study of Parallel Algorithms for the Girth Problem

In this paper we introduce efficient parallel algorithms for finding the girth in a graph or digraph, where girth is the length of a shortest cycle. We empirically compare our algorithms by using two common APIs for parallel programming in C++, which are OpenMP for multiple CPUs and CUDA for multi-core GPUs. We conclude that both hardware platforms and programming models have their benefits.


Parallel Protein Structure Alignment: A Comparative Study of Two Parallel Programming Paradigms

Protein 3D structure alignment process has become the key focus of interest in structural bioinformatics. Yet, obtaining perfect alignment in a short execution time was not successful to this point. To overcome this problem, researchers tend to use parallel programming techniques to enhance the performance of the alignment process. In this article, we compare between two parallel programming pa...


A High-performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications

We present a very flexible Brownian bridge generator together with a GPU implementation which achieves close to peak performance on an NVIDIA C2050. The performance is compared with an OpenMP implementation run on several high performance x86-64 systems. The GPU shows a performance gain of at least 10x. Full comparative results are given in Section 8: in particular, we observe that the Brownian...


OpenCL on shared memory multicore CPUs

Shared memory multicore processor technology is pervasive in mainstream computing. This new architecture challenges programmers to write code that scales over these many cores to exploit the full computational power of these machines. OpenMP and Intel Threading Building Blocks (TBB) are two of the popular frameworks used to program these architectures. Recently, OpenCL has been defined as a sta...
